Skip to content

Conversation

@rok
Copy link
Member

@rok rok commented Dec 22, 2025

Rationale for this change

This is the first in series of PRs adding type annotations to pyarrow and resolving #32609.

What changes are included in this PR?

This PR establishes infrastructure for type checking:

  • Adds CI workflow for running mypy, pyright, and ty type checkers on linux, macos and windows
  • Configures type checkers to validate stub files (excluding source files for now)
  • Adds PEP 561 py.typed marker to enable type checking
  • Updates wheel build scripts to include stub files in distributions
  • Creates initial minimal stub directory structure
  • Updates developer documentation with type checking workflow

Are these changes tested?

No. This is mostly a CI change.

Are there any user-facing changes?

This does not add any actual annotations (only py.typed marker) so user should not be affected.

Copy link
Member

@AlenkaF AlenkaF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @rok for working on this! Splitting up the main PR sounds like a great idea to me.

I only have one small question, otherwise the changes look good to me.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Dec 23, 2025
@rok
Copy link
Member Author

rok commented Dec 23, 2025

Thanks for review @AlenkaF ! I think we probably want @raulcd to look at this too before merging?

@AlenkaF
Copy link
Member

AlenkaF commented Dec 23, 2025

Yes, let's wait for Raul 👍

@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels Dec 23, 2025
Copy link
Member

@raulcd raulcd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @rok sorry it took me a while to review with holidays, release, etcetera

@raulcd
Copy link
Member

raulcd commented Jan 13, 2026

@github-actions crossbow submit wheel-manylinux-2-28-cp311-cp311-amd64

@github-actions
Copy link

Revision: 28fba3a

Submitted crossbow builds: ursacomputing/crossbow @ actions-8a459250fe

Task Status
wheel-manylinux-2-28-cp311-cp311-amd64 GitHub Actions

@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels Jan 14, 2026
compose.yaml Outdated
ARROW_S3: "OFF"
ARROW_SUBSTRAIT: "OFF"
ARROW_WITH_OPENTELEMETRY: "OFF"
PYARROW_TEST_ANNOTATIONS: "ON"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@raulcd do you think we actually want to be able to toggle these? I'm thinking we don't really need them

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think is worth it if instead of using a separate ci/scripts/python_test_type_annotations.sh we use ci/scripts/python_test.sh and we use the PYARROW_TEST_ANNOTATIONS variable to manage whether we want annotations to be tested or not as part of python testing.
If we decide we want to have two separate scripts I think this env var should be removed.
I lean towards having a single python test script that can allow us to also test annotations but I can find arguments for both so not an issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, agreed with your reasoning. I also introduced PYARROW_TEST_ANNOTATIONS to the windows script.

@rok
Copy link
Member Author

rok commented Jan 14, 2026

Thanks for review @raulcd ! I've responded to your points.

compose.yaml Outdated
ARROW_S3: "OFF"
ARROW_SUBSTRAIT: "OFF"
ARROW_WITH_OPENTELEMETRY: "OFF"
PYARROW_TEST_ANNOTATIONS: "ON"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think is worth it if instead of using a separate ci/scripts/python_test_type_annotations.sh we use ci/scripts/python_test.sh and we use the PYARROW_TEST_ANNOTATIONS variable to manage whether we want annotations to be tested or not as part of python testing.
If we decide we want to have two separate scripts I think this env var should be removed.
I lean towards having a single python test script that can allow us to also test annotations but I can find arguments for both so not an issue.

Comment on lines 178 to 183
# We first populate stub docstrings and then build the wheel
python setup.py build_ext --inplace
python -m pip install griffe libcst
python ../dev/update_stub_docstrings.py pyarrow-stubs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not necessary, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as #48618 (comment). It's a noop and we don't have to introduce python logic in stub PRs if we do here.

Comment on lines 138 to 140
@REM We first populate stub docstrings and then build the wheel
%PYTHON_CMD% setup.py build_ext --inplace
%PYTHON_CMD% -m pip install griffe libcst
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until we have real annotations this isn't necessary, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added running dev\update_stub_docstrings.py, so I suppose it's a noop now and we can keep it to make sure it doesn't fail before introducing actual stubs? This will keep CI logic out of stub PRs.

@rok rok force-pushed the pyarrow-stubs-pr1-infrastructure branch from 5ce9011 to cebbfcd Compare January 21, 2026 14:10
@rok
Copy link
Member Author

rok commented Jan 22, 2026

@kou could you take a quick look at this?

Comment on lines 242 to 246
- name: Test annotations
shell: bash
env:
PYARROW_TEST_ANNOTATIONS: "ON"
run: ci/scripts/python_test_type_annotations.sh $(pwd)/python
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't object this approach but it may be better that we use pre-commit for type checking. (I think that type checking is one of lint checks.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a nice idea! So instead of a new workflow step you are proposing we move these checks into pre-commit script, correct? I'll change to pre-commit approach.

One thing I wonder about - are there users that run all these checks locally and would not like the new extra jobs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, we need compiled pyarrow to run type annotation checks against and that is not practical for pre-commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's too bad. I reverted back to workflow approach, but only put the annotation job into a single workflow now.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Jan 23, 2026
Comment on lines 18 to 26
"""Type stubs for PyArrow.
This is a placeholder stub file.
Complete type annotations will be added in subsequent PRs.
"""

from typing import Any

def __getattr__(name: str) -> Any: ...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC this isn't intended to be permanent.
But I thought you might want to consider adding a note/opening an issue to remove it before release.

Here's some examples of how it can mask runtime errors:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, interesting I didn't consider this one! Yes this is meant as a temporary and we would want it out before the release. I'll make a note and issue, thanks for pointing it out!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jan 23, 2026
@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels Jan 23, 2026
@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels Jan 23, 2026
@rok
Copy link
Member Author

rok commented Jan 23, 2026

Docs failure is from the Pandas issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants